Learning to Select Goals in Automated Planning with Deep-Q Learning

Núñez-Molina, Carlos, Fernández-Olivares, Juan, Pérez, Raúl

arXiv.org Artificial Intelligence

In this work we propose a planning and acting architecture endowed with a module that learns to select subgoals with Deep Q-Learning. This allows us to decrease the load on the planner in scenarios with real-time restrictions. We have trained this architecture on a video game environment used as a standard test-bed for intelligent systems, and tested it on different levels of the same game to evaluate its generalization abilities. We have measured the performance of our approach as more training data is made available, and compared it with both a state-of-the-art classical planner and the standard Deep Q-Learning algorithm. The results show that our model outperforms the alternative methods when both plan quality (plan length) and time requirements are taken into account. On the one hand, it is more sample-efficient than standard Deep Q-Learning and generalizes better across levels. On the other hand, it reduces problem-solving time compared with a state-of-the-art automated planner, at the expense of producing plans with only 9% more actions.
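To make the subgoal-selection idea concrete, here is a minimal PyTorch sketch of how a DQN whose discrete actions are candidate subgoals could sit in front of a classical planner. The state dimension, number of subgoals, and the `plan_to_subgoal` stub are illustrative assumptions for this example, not the authors' actual implementation:

```python
import random

import torch
import torch.nn as nn

# Illustrative constants; the real architecture's state encoding and
# candidate-subgoal set are not specified in the abstract.
STATE_DIM = 64
N_SUBGOALS = 8

class SubgoalDQN(nn.Module):
    """Q-network that scores each candidate subgoal for the current state."""
    def __init__(self, state_dim: int, n_subgoals: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_subgoals),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def select_subgoal(q_net: SubgoalDQN, state: torch.Tensor, epsilon: float = 0.1) -> int:
    """Epsilon-greedy choice over subgoals, as in standard DQN exploration."""
    if random.random() < epsilon:
        return random.randrange(N_SUBGOALS)
    with torch.no_grad():
        return int(q_net(state).argmax())

def plan_to_subgoal(state: torch.Tensor, subgoal: int) -> list[str]:
    """Hypothetical stub: a classical planner would return a short plan
    reaching `subgoal` from `state`. Planning only to a nearby subgoal
    keeps each planner call cheap under real-time restrictions."""
    return ["<action>"]  # placeholder plan

q_net = SubgoalDQN(STATE_DIM, N_SUBGOALS)
state = torch.zeros(STATE_DIM)
subgoal = select_subgoal(q_net, state)
plan = plan_to_subgoal(state, subgoal)  # planner solves only a short-horizon problem
```

The division of labor described in the abstract is the point of the sketch: learning decides which subgoal to pursue, while the planner remains responsible for how to reach it.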


Goal Reasoning by Selecting Subgoals with Deep Q-Learning

Núñez-Molina, Carlos, Nikolov, Vladislav, Vellido, Ignacio, Fernández-Olivares, Juan

arXiv.org Artificial Intelligence

In this work we propose a goal reasoning method that learns to select subgoals with Deep Q-Learning, in order to decrease the load on a planner in scenarios with tight time restrictions, such as online execution systems. We have designed a CNN-based goal selection module and trained it on a standard video game environment, testing it on different games (planning domains) and levels (planning problems) to measure its generalization abilities. When comparing its performance with a satisficing planner, the results show that both approaches find plans of good quality, but our method greatly decreases planning time. We conclude that our approach can be successfully applied to different types of domains (games) and shows good generalization properties when evaluated on new levels (problems) of the same game (domain).
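As a rough illustration of what a CNN-based goal selection module could look like, the sketch below scores candidate subgoals from a grid encoding of a game level. The channel count, grid size, and layer widths are assumptions for the example, not details from the paper:

```python
import torch
import torch.nn as nn

class CNNGoalSelector(nn.Module):
    """Scores candidate subgoals from a C x H x W grid encoding of a level
    (one channel per tile/object type). All sizes here are illustrative."""
    def __init__(self, in_channels: int = 4, n_subgoals: int = 8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling makes the net level-size agnostic
        )
        self.head = nn.Linear(64, n_subgoals)

    def forward(self, grid: torch.Tensor) -> torch.Tensor:
        # grid: (batch, channels, height, width) -> (batch, n_subgoals)
        return self.head(self.conv(grid).flatten(1))

selector = CNNGoalSelector()
q_values = selector(torch.zeros(1, 4, 10, 10))  # one 10x10 level
print(q_values.shape)  # torch.Size([1, 8])
```

Global pooling of this kind is one way a convolutional selector could transfer across levels (problems) of different sizes within the same game (domain), which is the generalization behavior the abstract reports.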